Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian Source
Abstract
For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models. In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., as a prefix of the previous (i − 1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, for which we coin the name Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, thus producing a sequence of variable (random) length. In the last model, also known as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees, which are of interest to other algorithms such as sorting, searching, and pattern matching. In this paper, we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel-Ziv code. For the Markov Independent model, this finding is established by analytic methods (i.e., generating functions, the Mellin transform, and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
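The parsing rule described in the abstract (each new phrase is the shortest prefix of the remaining input not yet seen as a phrase) can be illustrated with a short Python sketch for the fixed-length string setting. The function names `lz_parse` and `phrase_profile` are hypothetical and chosen for this illustration only. The sketch uses a plain set of phrases instead of a digital search tree and makes no assumption about how the input was generated, so it shows the parsing and the empirical profile, not the paper's asymptotic analysis.

```python
from collections import Counter


def lz_parse(text):
    """Split `text` into phrases, each being the shortest prefix of the
    remaining input that has not already occurred as a phrase.

    Illustrative sketch: a set stands in for the digital search tree.
    """
    seen = set()      # phrases produced so far
    phrases = []
    current = ""      # phrase under construction
    for symbol in text:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The input ended in the middle of a phrase; this last piece
        # repeats an earlier phrase and is kept as an incomplete phrase.
        phrases.append(current)
    return phrases


def phrase_profile(phrases):
    """Return a Counter mapping phrase length -> number of phrases of that length."""
    return Counter(len(p) for p in phrases)


if __name__ == "__main__":
    phrases = lz_parse("1011010100010")
    print(phrases)
    print(dict(phrase_profile(phrases)))
```

For the binary string `1011010100010`, this yields the phrases 1, 0, 11, 01, 010, 00, 10, so the empirical profile has two phrases of length 1, four of length 2, and one of length 3.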
Similar resources
Average Profile of the Lempel-Ziv Parsing Scheme for Markovian Source
Jing Tang, Microsoft Corporation, One Microsoft Way, 1/2061, Redmond, WA 98052, U.S.A. For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sou...
On Generalized Digital Search Trees with Applications to a Generalized Lempel-Ziv
The goal of this research is twofold: (i) to analyze generalized digital search trees, and (ii) to derive the average profile (i.e., phrase length) of a generalization of the well-known parsing algorithm due to Lempel and Ziv. In the generalized Lempel-Ziv parsing scheme, one partitions a sequence of symbols from a finite alphabet into phrases such that the new phrase is the longest substring seen...
Universal coding of nonstationary sources
In this correspondence we investigate the performance of the Lempel–Ziv incremental parsing scheme on nonstationary sources. We show that it achieves the best rate achievable by a finite-state block coder for the nonstationary source. We also show a similar result for a lossy coding scheme given by Yang and Kieffer which uses a Lempel–Ziv scheme to perform lossy coding.
Lempel-Ziv Dimension for Lempel-Ziv Compression
This paper describes the Lempel-Ziv dimension (a Hausdorff-like dimension inspired by the LZ78 parsing), its fundamental properties, and its relation to Hausdorff dimension. It is shown that, in the case of individual infinite sequences, the Lempel-Ziv dimension matches the asymptotic Lempel-Ziv compression ratio. This fact is used to describe results on Lempel-Ziv compression in terms of dime...
On the optimality of parsing in dynamic dictionary based data compression (preliminary version)
Since the introduction of dynamic dictionary based data compression by Ziv and Lempel two decades ago, many dictionary construction schemes have been proposed and implemented. This paper considers the following question: once a dynamic dictionary construction scheme is selected, is there an efficient dynamic parsing method that results in the smallest number of phrases possible for the selected sch...